{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## A Practical Application of Counting & Likelihood Estimation\n",
    "#### by me, Fiona Pigott"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problem: Find the most Retweeted Tweet \n",
    "\n",
    "### Given: set of Retweets of various original Tweets\n",
    "We want to use a collected sample of all Retweets to estimate which Tweet was Retweeted the most. We can count up all of the Retweets in our sample to estimate the ordering of the Tweets, but we can't count every Retweet to get the exact number. How big of a sample do we need to be relatively certain that we've correctly identified the most Retweeted Tweet?\n",
    "\n",
    "#### Tools that we have:\n",
    "- PowerTrack with the sampling operator\n",
    " - PowerTrack sampling works by flipping a biased coin for each Tweet to decide whether or not to include it in the sample: if the sample is 10%, each Tweet has an independent 10% chance of being included in the sample.\n",
    " - We can select only Retweets using a PowerTrack operator\n",
    "- More information on PowerTrack: http://support.gnip.com/apis/powertrack2.0/rules.html\n",
    "\n",
    "#### What we need to know:\n",
    "What's the probability that, for some sampling factor (using PowerTrack sampling) I get this ordering wrong (i.e., I mistakenly identify some Tweet as the most Retweeted Tweet).\n",
    "\n",
    "### Problem statement:\n",
    "- Let the Tweet most Retweeted in the set have a total of $\\tau_1$ Retweets\n",
    "- Let the 2nd most Retweeted in the set have a total of $\\tau_2$ Retweets\n",
    "- Take a sample of the whole set of Retweets, including each individual Retweet with a probability $s$ and excluding it with a probability $(1-s)$. Note that the inclusion of any given Retweet is independent of whether or not any other Retweet is included.\n",
    "- Call the number of occurrences of Retweets of the most Retweeted and 2nd most Retweeted Tweets in the sample $T_1$ and $T_2$ respectively, and the \n",
    "\n",
    "### Some probability vocabulary:\n",
    "A **random variable**: \"A variable quantity whose value depends on possible outcomes.\" In this case, $T_1$ and $T_2$ are \"random variables\"  \n",
    "An **event**: A measurable outcome of the experiment (i.e., \"$T_1 = t_1$,\" where $t_1$ is some specific number of Retweets in the sample)  \n",
    "**Independent events**: Two (or more) events whose outcomes do not affect eachother. In this example, all of our events are pretty much independent of eachother.  \n",
    "A **PMF** or **probabilty mass function**: For a discrete probabilty distribution (in this case, we're dealing with a discrete distribution, because we cannot collect a sample of 10.5 Tweets), the probability mass function is a function that gives the probability of each possible event, and the sum over all possible events is 1.   \n",
    "A **joint PMF**: A function that gives the probabilty of two events occurring together. For two independent events, we can just multiply the two individual PMFs together. Easy.\n",
    "\n",
    "\n",
    "### And now! Counting.\n",
    "#### Counting probabilities of $t_1$ and $t_2$ occurences:\n",
    "The probability mass function of $T_1$ is: \n",
    "The probabilty of selecting $t_1$ Tweets, so $s^{t_1}$ times the probabilty of not selecting $\\tau_1 - t_1$ Tweets, times the number of different ways you could select $t_1$ Tweets, ${\\tau_1\\choose t_1}$:\n",
    "$$ P(T_1 = t_1) = {\\tau_1\\choose t_1} s^{t_1}(1-s)^{(\\tau_1 - t_1)}$$\n",
    "Similarly, the probability mass function of $T_2$ is:\n",
    "$$ P(T_2 = t_2) = {\\tau_2\\choose t_2} s^{t_2}(1-s)^{(\\tau_2 - t_2)}$$\n",
    "\n",
    "#### This distribution might look familiar: it's called the binomial distribution\n",
    " > \"In probability theory and statistics, the binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes–no question, and each with its own boolean-valued outcome\"   \n",
    " > _from Wikipedia_\n",
    " $$ B(n,p) = \t\\textstyle {n \\choose k}\\,p^{k}(1-p)^{n-k} $$\n",
    "\n",
    "\n",
    "#### The probability of exactly $t_1$ and exactly $t_2$ occurrences (joint PMF):\n",
    "$$ {\\tau_1\\choose t_1} s^{t_1}(1-s)^{(\\tau_1 - t_1)} {\\tau_2\\choose t_2} s^{t_2}(1-s)^{(\\tau_2 - t_2)} = $$\n",
    "$$ {\\tau_1\\choose t_1}{\\tau_2\\choose t_2} s^{t_1 + t_2} (1-s)^{\\tau_1 + \\tau_2 - t_1 - t_2} $$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# math\n",
    "import numpy as np\n",
    "from scipy.misc import comb\n",
    "from scipy.stats import binom, norm\n",
    "from itertools import accumulate\n",
    "from math import sqrt, ceil, log\n",
    "# python tools\n",
    "from collections import Counter\n",
    "from itertools import zip_longest\n",
    "from multiprocessing import Process, Pool, Queue\n",
    "from random import random\n",
    "import os\n",
    "# plotting\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Getting a feel for it:  What's the probability that $t_2 \\geq t_1$ given that $\\tau_1 > \\tau_2$?\n",
    "$$ \\sum_{j = 0}^{\\tau_2}\\sum_{i = j}^{\\tau_2} {\\tau_1\\choose j}{\\tau_2\\choose i} s^{j + i} (1-s)^{\\tau_1 + \\tau_2 - j - i}$$\n",
    "Note: $t_1 \\rightarrow j$, $t_2 \\rightarrow i$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Before we start with the joint probabilty, let's just look at how $t_1$ is distributed:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Set up the problem. We have to fix tau_1, tau_2, and s\n",
    "tau1 = 100\n",
    "tau2 = 75\n",
    "s = .1\n",
    "\n",
    "# We can use the binomial PMF from scipy, which is better than coding it up myself \n",
    "# because all of those factorials make for hard computations\n",
    "# we're gonna use these a lot\n",
    "binomial_tau1 = binom(tau1,s)\n",
    "binomial_tau2 = binom(tau2,s)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Let's show how likely it is to get exactly t_1 of T_1. Just for practice\n",
    "\n",
    "# do the probability calculations\n",
    "probabilities_t1 = []\n",
    "for i in range(0,tau1):\n",
    "    #probabilities_t1.append((i,comb(tau1,i)*s**(i)*(1-s)**(tau1-i)))\n",
    "    probabilities_t1.append((i,binomial_tau1.pmf(i)))\n",
    "probabilities_t2 = []\n",
    "for i in range(0,tau2):\n",
    "    #probabilities_t2.append((i,comb(tau2,i)*s**(i)*(1-s)**(tau2-i)))\n",
    "    probabilities_t2.append((i,binomial_tau2.pmf(i)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plt.plot([x[0] for x in probabilities_t1],[x[1] for x in probabilities_t1])\n",
    "plt.title(\"Probability Mass Function for the Binomial distribution\")\n",
    "plt.xlabel(\"Number of successes in tau_1 trials (t_1)\")\n",
    "plt.ylabel(\"Probability of exactly t_1 successes\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### How likely is it that we get _exactly_ X% of $\\tau_1$? \n",
    "$$ P(t_1 = s\\tau_1)  = {\\tau_1\\choose s\\tau_1} s^{t_1}(1-s)^{(\\tau_1 - s\\tau_1)}$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "print(\"The probability of getting exactly s*tau_1 samples is: {:f}\".format(binomial_tau1.pmf(s*tau1)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### How likely is it that we get $t_1$ samples, with $t_1$ on some interval? Say, $s\\tau_1 \\pm$ 1%?\n",
    "$$ P(.99s\\tau_1 < t_1 < 1.01s\\tau_1)   = \\sum_{i = .99s\\tau_1}^{1.01s\\tau_1}{\\tau_1\\choose i} s^{i}(1-s)^{(\\tau_1 - i)} $$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plus_or_minus_percent = .1\n",
    "lower_bound_t1 = int((1 - plus_or_minus_percent)*s*tau1)\n",
    "upper_bound_t1 = int((1 + plus_or_minus_percent)*s*tau1)\n",
    "probability_t1_interval = 0\n",
    "for i in range(lower_bound_t1,upper_bound_t1):\n",
    "    probability_t1_interval += binomial_tau1.pmf(i)\n",
    "print(\"P({} <= t_1 < {}) = {:f}\".format(lower_bound_t1,upper_bound_t1,probability_t1_interval))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Standard deviation of the size of the sample\n",
    "\n",
    "It's relatively simple to work out the standard deviation of the Binomial distribution, if we remember a few things about expected values (averages), standard deviation and variance:  \n",
    "1. Expected value is a linear operator (i.e., $E[A + 2B] = E[A] + 2E[B]$)  \n",
    "2. We'll call $E[X] = \\mu$ \n",
    "3. Variance: $Var[X] = E[(X - \\mu)^2] = E[X^2] - E[X]^2$\n",
    "4. If two random variables are uncorrelated, Variance is a linear operator: $Var[\\sum_i X_i] = \\sum_i Var[X_i]$\n",
    "5. Standard deviation: $\\sqrt{Var[X]}$  \n",
    "\n",
    "And remember one thing about the Binomial distribution:\n",
    " > \"In probability theory and statistics, the binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in **a sequence of n independent experiments (Bernoulli trials)**, each asking a yes–no question, and each with its own boolean-valued outcome\"   \n",
    " > _from Wikipedia_\n",
    "\n",
    "I'm not going to work this out in a lot of detail here, I'll simply say that the expected value ($E[X]$) of a single Bernoulli trial (i.e., for one Tweet, decide whether it is in the sample or outside of the sample) is equal to $s$, and the variance of that single trial is equal to $Var[X] = s(1-s)$.  \n",
    "\n",
    "\n",
    "Then, if the total number of Tweets in the sample is simply the sum of these independent Bernoulli trials, the expected value of $t_1$ is $E[t_1] = s\\tau_1$ and the variance is $Var[t_1] = \\tau_1s(1-s)$\n",
    "\n",
    "The standard deviation is then: $\\sqrt{\\tau_1s(1-s)} $"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# standard deviation varies as the size of tau1 varies\n",
    "std_dev_sizes = []\n",
    "max_tau = 10**4\n",
    "for i in range(0,max_tau,5):\n",
    "    std_dev_sizes.append((i,sqrt(i*s*(1-s))))\n",
    "plt.plot([x[0] for x in std_dev_sizes],[x[1] for x in std_dev_sizes])\n",
    "_ = plt.xlim([0,max_tau])\n",
    "_ = plt.ylim([0,max([x[1] for x in std_dev_sizes])])\n",
    "_ = plt.title(\"Standard deviation of the sample size, varying tau\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Using the standard deviation to estimate the most likely interval for $t_1$ to fall in:\n",
    "Now, I'm not going to work this out explicitly because there are other fish I'd like to fry, but know that as the number of trials increases, the Binomial distribution tends towards the normal distribution.\n",
    "\n",
    "That means we can use the same rule of thumb to estime how much of the distribution falls within, say, 3 standard deviations of the mean that we use for normal distributions (that is, 99.7% of all of the probabilty mass falls within 3 standard deviations of the mean).\n",
    "\n",
    "I'm going to use a plot to try to get you to beleive me:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Let's show how likely it is to get exactly t_1 of T_1. Just for practice\n",
    "# we cen grab the \n",
    "normal_distribution_tau1 = norm(tau1*s,sqrt(tau1*s*(1-s)))\n",
    "normal_prob_t1 = []\n",
    "probabilities_t1 = []\n",
    "for i in range(0,tau1):\n",
    "    probabilities_t1.append((i,binomial_tau1.pmf(i)))\n",
    "    normal_prob_t1.append((i,normal_distribution_tau1.pdf(i)))\n",
    "plt.plot([x[0] for x in probabilities_t1],[x[1] for x in probabilities_t1],linewidth = 5)\n",
    "plt.plot([x[0] for x in normal_prob_t1],[x[1] for x in normal_prob_t1],\"--\",linewidth = 3)\n",
    "plt.legend([\"binomial\",\"normal\"])\n",
    "plt.ylabel(\"Probabilty mass a x\")\n",
    "plt.xlabel(\"t_1\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# the total probability of falling within 3 standard deviations of the mean (s*tau_1):\n",
    "plus_or_minus_stdv = .01\n",
    "lower_bound_t1 = max(0,int(s*tau1 - 3 * sqrt(tau1*s*(1-s))))\n",
    "upper_bound_t1 = int(s*tau1 + 3 * sqrt(tau1*s*(1-s)))\n",
    "probability_t1_interval = 0\n",
    "for i in range(lower_bound_t1,upper_bound_t1+1):\n",
    "    probability_t1_interval += binomial_tau1.pmf(i)\n",
    "print(\"P({} < t_1 < {}) = {:f} (very nearly 99.7%)\".format(lower_bound_t1,upper_bound_t1,probability_t1_interval))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Now let's look at $t_1$ and $t_2$ together:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# t1 and t2 most likely fall within some range\n",
    "# t1 and t2 are selected form binomial distributions, they are symmetric \n",
    "# and their standard deviations are easily calculated\n",
    "fig, ax = plt.subplots(1, 1)\n",
    "ax.plot([x[0] for x in probabilities_t1],[x[1] for x in probabilities_t1])\n",
    "ax.plot([x[0] for x in probabilities_t2],[x[1] for x in probabilities_t2])\n",
    "ax.legend([\"t_1\",\"t_2\"])\n",
    "ax.set_ylabel(\"Probabilty mass at x\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Finally, the joint PMF:\n",
    "$$ {\\tau_1\\choose t_1} s^{t_1}(1-s)^{(\\tau_1 - t_1)} {\\tau_2\\choose t_2} s^{t_2}(1-s)^{(\\tau_2 - t_2)}$$\n",
    "\n",
    "#### What's the probability that $t_2 \\geq t_1$ given that $\\tau_1 > \\tau_2$?\n",
    "$$ \\sum_{j = 0}^{\\tau_2}\\sum_{i = j}^{\\tau_2} {\\tau_1\\choose j}{\\tau_2\\choose i} s^{j + i} (1-s)^{\\tau_1 + \\tau_2 - j - i}$$\n",
    "Note: $t_1 \\rightarrow j$, $t_2 \\rightarrow i$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Now let's code this up so we can simulate it for various values of s, tau_1, and tau_2\n",
    "probabilities_t2_and_t1 = np.zeros((tau1,tau2))\n",
    "\n",
    "# i can seriously speed up the amount of time this takes by not calculating the very, very low probabilities\n",
    "# outside of 3 stdev from the mean\n",
    "bool_truncate_window = True\n",
    "\n",
    "if bool_truncate_window:\n",
    "    stdv1 = ceil(sqrt(tau1*s*(1-s)))\n",
    "    stdv2 = ceil(sqrt(tau2*s*(1-s)))\n",
    "    num_stdv = 3\n",
    "    window_t1 = [max(ceil(s*tau1)-num_stdv*stdv1,0), ceil(s*tau1)+num_stdv*stdv1]\n",
    "    window_t2 = [max(ceil(s*tau2)-num_stdv*stdv2,0), ceil(s*tau2)+num_stdv*stdv2]\n",
    "else:\n",
    "    window_t1 = [0,tau1]\n",
    "    window_t2 = [0,tau2]\n",
    "\n",
    "for j in range(window_t1[0],window_t1[1]):\n",
    "    for i in range(window_t2[0],window_t2[1]):\n",
    "        probabilities_t2_and_t1[j,i] = binomial_tau1.pmf(j)*binomial_tau2.pmf(i)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "figure, ax = plt.subplots(1,1,figsize = (6,12))\n",
    "ax.imshow(probabilities_t2_and_t1, cmap='Blues', interpolation='nearest', aspect = 'equal', origin = 'lower')\n",
    "\n",
    "ax.set_xlim([0,tau2])\n",
    "ax.set_ylim([0,tau1])\n",
    "if bool_truncate_window:\n",
    "    ax.fill_between(window_t2,0,tau1,alpha = .2,color = 'navajowhite')\n",
    "    ax.fill_between([0,tau2],window_t1[0],window_t1[1],alpha = .2,color = 'lightsteelblue')\n",
    "ax.plot([0,tau2],[0,tau2],'--', color = 'gray')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# The total probabilties should sum to 1\n",
    "# because I truncated, they sum to something slightly less than one--you can see how small the actual \n",
    "# values are in those other areas \n",
    "print(\"Total probabilty represented: {:f}\".format(probabilities_t2_and_t1.sum()))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Now, sum only the lower triangle--the piece where t_2 >= t_1\n",
    "print(\"The probabilty that t_2 >= t_1 given tau1 = {} and tau2 = {}: {}\".format(\n",
    "    tau1, tau2, np.triu(probabilities_t2_and_t1).sum()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### That was interesting, but it wasn't exactly the problem that we wanted to answer\n",
    "What we just did assumed that we knew $\\tau_1$ and $\\tau_2$, and we figured out probabilities of getting $t_1$ and $t_2$ given those specific $\\tau$ values. Actually, $\\tau$ is the value that we don't know (the value we're trying to estimate).\n",
    "\n",
    "### We want to know: What's the probability that $\\tau_1$ > $\\tau_2$ given that $t_1 > t_2$? What's the probability that we get our ordering (of the most popular, second most popular, etc Retweets) correct?\n",
    "\n",
    "Basically, we know that the number of Retweets of any given Tweet in our sample is going to follow this Binomial distribution, where $\\tau$ is the number of Retweets in total (the number of Retweets that we would have collected with a 100% sample) and $t$ is:\n",
    "$$ P(T = t) = {\\tau\\choose t} s^{t}(1-s)^{(\\tau - t)}$$\n",
    "We know that, because I've already typed it several times in this notebook. Now, when we actually run this experiment, we know the probability of including any individual Retweet in the sample, that probabilty is $s$, and we set it. We also know the outcome of each experiment--that outcome is the number of Retweets that we actually count in our sample. The parameter of the distribution that we **don't** know is $\\tau$--the total number of Retweets.\n",
    "\n",
    "### Estimating the value of a parameter in a probabilty distribution\n",
    "\n",
    "One often-used and often-referenced way to find the most likely value of a parameter is called \"Maximum Likelihood Estimation,\" and I think that right now would be a good time to talk about Maximum Likelihood Estimators (because it's a generally useful and interesting concept).\n",
    "\n",
    "Now, it turns out that finding a nice, tidy Maximum Likelihood Estimator for $\\tau$ in this distribution is very difficult (you'll see why soon enough), so I'm going to do two things:\n",
    "1. Explain, generally, what MLEs are\n",
    "2. Pretend that I know $\\tau$ and don't know $s$, and demonstrate a simple, clean example of using MLEs to estimate $s$ in this Binomial distribution\n",
    "3. Use numerical Maximum Likelihood Estimation to estimate $\\tau$ (closer to the actual problem I'm trying to solve)\n",
    "\n",
    "#### 1. What is Maximum Likelihood estimation?\n",
    "Maximum Likelihood Estimation uses the outcomes of many iterations of an experiment to guess (estimate) the paramater that maximizes the probabilty (likelihood) of those outcomes. That's a pretty intuitive idea.  \n",
    "For example: If you flip a single coin 100 times, and that coin comes up heads 30 times, you might guess that the probabilty of that coin coming up heads in general is 0.3. That would be a pretty good guess, because the probabilty of getting 30/100 \"heads\" results is maximized if the probabilty of the coins coming up \"heads\" is 0.3.\n",
    "\n",
    "Let's work that out:\n",
    "For any set of independent, identically distributed experiments ($X_i$) and outcomes ($X_i = x_i$), the probabiliy of all of the outcomes occurring at once would simply be the joint PMF of all of the experiments (remember, because they're independent, we can just multiply PMFs together). \n",
    "\n",
    "Call each PMF $P(X_i = x_i; \\theta)$ (where $\"; \\theta\"$ denotes the value of the parameter that we're trying to estimate, which the PMF is dependent on) then the probability (likelihood) of a set of $n$ results is dependent on a independent variable $\\theta$:\n",
    "$$ L(\\theta) = P(X_1 = x_1, X_2 = x_2, X_3 = x_3, \\dots, X_n = x_n; \\theta) = \\prod_{i = 0}^n P(X_i= x_i; \\theta)$$\n",
    "\n",
    "Now, if we find the value of the parameter $\\theta$ that maximizes the likelihood of our results $x_i$ coming from the distribution, we have found the Maximum Likelihood Estimator for $\\theta$. Super.\n",
    "\n",
    "#### 2. A simple example of MLE estimation\n",
    "\n",
    "Right now, we're going to do an example of a Bernoulli distribution (basically, a Binomial distribution with only one trial). Think of the coin toss example again."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def bernoulli_experiment(s):\n",
    "    if random() < s:\n",
    "        return True\n",
    "    else:\n",
    "        return False\n",
    "\n",
    "def bernoulli_pmf(x, s):\n",
    "    if x:\n",
    "        return s\n",
    "    else:\n",
    "        return 1-s\n",
    "\n",
    "def eval_bernoulli_likelihood_for_all_x_i(x_i, s):\n",
    "    likelihood = 1\n",
    "    for x in x_i:\n",
    "        likelihood = likelihood * bernoulli_pmf(x,s)\n",
    "    return likelihood"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# set up the experiment\n",
    "# number of trials\n",
    "n = 100\n",
    "# probabilty, s. let's assign it randomly and check later\n",
    "s_sample = random()\n",
    "s_range = list(np.linspace(0,1,100))\n",
    "# results, x_i of n experiments\n",
    "x_i = [bernoulli_experiment(s_sample) for x in range(0,n)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# get the likelihood results\n",
    "likelihood = [eval_bernoulli_likelihood_for_all_x_i(x_i, s_) for s_ in s_range]\n",
    "estimated_s = max(list(zip(s_range,likelihood)), key = lambda x:x[1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# plot\n",
    "plt.plot(s_range,likelihood)\n",
    "plt.title(\"Max Likelihood as a function of s\")\n",
    "plt.legend([\"actual value of s: s = {:f}\".format(s_sample) + \"\\n\" + \"estimated value of s: {:f}\".format(estimated_s[0])])\n",
    "plt.xlabel(\"s\")\n",
    "plt.ylabel(\"likelihood\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### I've heard people say the words \"log-likelihood\" \n",
    "Me too! Notice how the values of the likelihood (y-axis) are very, very small? Taking the log of both sides of the likelihood equation lets us maximize something a little computationally friendlier than 10^-20 or whatever, and since a logarithm is a monotonic function, it doesn't change the position of the max. Remember, the log of a product is a sum of the logs.\n",
    "\n",
    "$$ L(\\theta) = \\prod_{i = 0}^n P(X_i= x_i; s = \\theta)$$\n",
    "$$ \\log(L(\\theta)) = \\sum_{i = 0}^n \\log(P(X_i= x_i; s = \\theta))$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def eval_bernoulli_loglikelihood_for_all_x_i(x_i, p):\n",
    "    likelihood = 0\n",
    "    for x in x_i:\n",
    "        likelihood += np.log(bernoulli_pmf(x,p))\n",
    "    return likelihood"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# get the likelihood results\n",
    "loglikelihood = [eval_bernoulli_loglikelihood_for_all_x_i(x_i, s_) for s_ in s_range[1:-1]]\n",
    "log_estimated_s = max(list(zip(s_range,likelihood)), key = lambda x:x[1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# plot\n",
    "# notice that the max is in the same place\n",
    "plt.plot(s_range[1:-1],loglikelihood)\n",
    "plt.title(\"Max Log Likelihood as a function of s\")\n",
    "plt.legend([\"actual value of s: s = {:f}\".format(s_sample) + \"\\n\" + \"estimated value of s: {:f}\".format(\n",
    "    log_estimated_s[0])])\n",
    "plt.xlabel(\"s\")\n",
    "plt.ylabel(\"likelihood\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### And finally, there's a way to do the above analytically (rather than numerically)\n",
    "Recall: the maximum of a function occurs when the derivative is equal to zero. \n",
    "Also: It tends to be easier to differentiate the log-likelihood (rather than the likelihood). So let's do that.\n",
    "\n",
    "Write the Bernoulli PMF as: $ s^{x_i}(1−s)^{1−x_i} $, where $x_i$ has one of two values: 0 or 1.\n",
    "\n",
    "$$ \\log(L(\\theta)) = \\sum_{i = 0}^n \\log(P(X_i= x_i; s = \\theta))$$ \n",
    "$$ \\log(L(\\theta)) = \\sum_{i = 0}^n \\log(\\theta^{x_i}(1−\\theta)^{1−x_i}) $$  \n",
    "$$ \\log(L(\\theta)) = (\\sum_{i = 0}^n x_i)\\log(\\theta)+(n−\\sum_{i = 0}^n x_i)\\log(1−\\theta) $$   \n",
    "$$ \\frac{d\\log(L(\\theta)}{d\\theta} = \\frac{d}{d\\theta} (\\sum_{i = 0}^n x_i)\\log(\\theta)+\\frac{d}{d\\theta}(n−\\sum_{i = 0}^n x_i)\\log(1−\\theta) $$\n",
    "$$ \\frac{d\\log(L(\\theta)}{d\\theta} = \\frac{\\sum_{i = 0}^n x_i}{\\theta}+\\frac{n−\\sum_{i = 0}^n x_i}{1−\\theta} $$ \n",
    "\n",
    "Find where the derivative is equal to 0:\n",
    "$$ \\frac{d\\log(L(\\theta)}{d\\theta} = \\frac{\\sum_{i = 0}^n x_i}{\\theta}+\\frac{n−\\sum_{i = 0}^n x_i}{1−\\theta} = 0 \\text { for } \\theta = \\frac{\\sum_{i = 0}^n x_i}{n}$$ "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# let's see what we would estimate s to be analytically\n",
    "print(\"Estimate of s is: {} \\n (The correct value is {})\".format(sum(x_i)/n,s_sample))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3. Estimating the value of $\\tau$\n",
    "\n",
    "Now, remember the Binomial distribution again for a second: $$ P(T = t) = {\\tau\\choose t} s^{t}(1-s)^{(\\tau - t)}$$\n",
    "\n",
    "If we wanted to find the MLE for $\\tau$ analytically, we'd have to take the derivative with respect to $\\tau$, which would be a bloodbath, pretty much. Just think of all of those factorials. But remember that we were able to numerically estimate the value of $s$ earlier, simply by finding where the PMF is maximized. We can do that to estimate the value of $\\tau$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# set up the experiment\n",
    "# number of trials\n",
    "n = 100\n",
    "# probabilty, s. i'll use the same \"s\" that we're using for the problem setup\n",
    "print(\"s = {}\".format(s))\n",
    "# now, fix a \"real\" tau\n",
    "# tau could really be anywhere from 0 to infinity, but I'll fix it between 0 and 100\n",
    "tau_sample = ceil(random()*100)\n",
    "# possible tau values\n",
    "possible_tau_range = range(1,101)\n",
    "print(\"The real tau is {}\".format(tau_sample))\n",
    "# the result of n experiments\n",
    "binomial_tau_sample = binom(tau_sample, s)\n",
    "t_i = [binomial_tau_sample.rvs() for x in range(0,n)]\n",
    "#print(\"The results of the esperiments are: {}\".format(t_i))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Now, evaluate the binomial distribution likelihood for this collection of results\n",
    "def binomial_loglikelihood(t_i, s, tau):\n",
    "    binomial_distribution = binom(tau,s)\n",
    "    loglikelihood = 0\n",
    "    for t in t_i:\n",
    "        loglikelihood += np.log(binomial_distribution.pmf(t))\n",
    "    return loglikelihood\n",
    "        \n",
    "# there's no sense in testing tau that are lower than the max of t_i\n",
    "likelihood_tau = [(x, binomial_loglikelihood(t_i, s, x)) for x in possible_tau_range if x > max(t_i)] \n",
    "# estimate tau\n",
    "estimated_tau = max(likelihood_tau, key = lambda x:x[1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plt.plot([x[0] for x in likelihood_tau], [x[1] for x in likelihood_tau])\n",
    "plt.title(\"Max Likelihood as a function of tau\")\n",
    "plt.legend([\"actual value of tau: tau = {:f}\".format(tau_sample) \n",
    "            + \"\\n\" + \"estimated value of tau: {:f}\".format(estimated_tau[0])])\n",
    "plt.xlabel(\"tau\")\n",
    "plt.ylabel(\"likelihood\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Finally: the question that we actually wanted to answer\n",
    "\n",
    "#### Given $t_1$ and $t_2$ ($t_1 > t_2$), what is the probabilty that $\\tau_1 > \\tau_2$?\n",
    "\n",
    "Now, the more samples we have (the more experiments we run) the better we can estimate $\\tau$ . Unfortunately, in real life, using PowerTrack, we're not going to be able to take 100 different random samples of the total, since PowerTrack sampling is deterministic, but estimating the value of the parameter follows the same idea--find the value of the parameter that is most likely to give the observed result.\n",
    "\n",
    "Now, here's where I'm venturing off the beaten path a little. I'm not that interested in making sure I have the best estimate of $\\tau$, as long as I get the ordering (\"is $\\tau_1$ bigger than $\\tau_2$?\") correct.\n",
    "\n",
    "I figure that the proportion of the likelihoods (likelihood that $\\tau_1$ is greater than $\\tau_2$, divided by the total likelihoods) is probably a good estimate of how likely it is that $\\tau_1$ is actually greater than $\\tau_2$.\n",
    "\n",
    "Also note that I can't possibly test the likelihood of every possible value of $\\tau$, since $\\tau$ could theoretically go to infinity. Instead, I'm making an initial guess at $\\tau$ (simply the sample size divided by the sample proportion), and only testing $\\tau$ values within 3 standard deviations of that value. I'm not sure that's a particularly good way to do it, since the standard deviation is really about the deviation of the sample size, not the deviation of $\\tau$. But it seems okay."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "# Now let's code this up so we can simulate it for various values of s, tau_1, and tau_2\n",
    "# fix t1 and t2--the values that we observed IRL\n",
    "t1 = 100\n",
    "t2 = 95\n",
    "\n",
    "# we aren't going to evaluate this over every possible tau, because that would be wayyyyy to large\n",
    "# let's estimate a window\n",
    "stdv1 = ceil(sqrt(t1*(1-s)))\n",
    "stdv2 = ceil(sqrt(t2*(1-s)))\n",
    "num_stdv = 3\n",
    "window_est_tau1 = [int(max(ceil(t1)-num_stdv*stdv1,0)*(1/s)), int((ceil(t1)+num_stdv*stdv1)*(1/s))]\n",
    "window_est_tau2 = [int(max(ceil(t2)-num_stdv*stdv2,0)*(1/s)), int((ceil(t2)+num_stdv*stdv2)*(1/s))]\n",
    "\n",
    "# calculate the probabilty that binomial distributions with tau1 = j and tau2 = i resulted in samples t1 and t2\n",
    "# use mutiprocessing bc this is kinda slow\n",
    "# function that we are evaluating\n",
    "def evaluate_binomial(j_tau1_i_tau2):\n",
    "    binomial_tau1 = binom(j_tau1_i_tau2[0],s)\n",
    "    binomial_tau2 = binom(j_tau1_i_tau2[1],s)\n",
    "    return (j_tau1_i_tau2, binomial_tau1.pmf(t1)*binomial_tau2.pmf(t2))\n",
    "# create the processes\n",
    "pool = Pool(processes=4)\n",
    "tau2_and_tau1_list = []\n",
    "probabilities_tau2_and_tau1_list = []\n",
    "for j_tau1 in range(window_est_tau1[0],window_est_tau1[1]):\n",
    "    for i_tau2 in range(window_est_tau2[0],window_est_tau2[1]):\n",
    "        tau2_and_tau1_list.append((j_tau1,i_tau2))\n",
    "# get the results\n",
    "probabilities_tau2_and_tau1_list = pool.map(evaluate_binomial, tau2_and_tau1_list)\n",
    "probabilities_tau2_and_tau1 = np.zeros((window_est_tau1[1],window_est_tau2[1]))\n",
    "for probabilty_calcd in probabilities_tau2_and_tau1_list: \n",
    "     probabilities_tau2_and_tau1[probabilty_calcd[0]] = probabilty_calcd[1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "figure, ax = plt.subplots(1,1,figsize = (6,12))\n",
    "plt.imshow(probabilities_tau2_and_tau1, cmap='Blues', interpolation='nearest', aspect = 'equal', origin = 'lower')\n",
    "plt.plot([0,window_est_tau2[1]],[0,window_est_tau2[1]],'--',color = 'gray')\n",
    "ax.fill_between([0,window_est_tau2[1]],0,window_est_tau1[0],alpha = 1,color = 'white')\n",
    "ax.fill_between([0,window_est_tau2[0]],window_est_tau1[0],window_est_tau1[1],alpha = 1,color = 'white')\n",
    "_ = plt.xlim([0,window_est_tau2[1]])\n",
    "_ = plt.ylim([0,window_est_tau1[1]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "print(\"Estimate of the probabilty that tau_1 < tau_2 given that t_1 = {} and t_2 = {}: {}\".format(\n",
    "    t1,t2,np.triu(probabilities_tau2_and_tau1).sum()/probabilities_tau2_and_tau1.sum()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### That's all folks!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
